Fix Dawn nightly: select SDPA attn output by shape, not numel (#20283)

Merged: meta-codesync[bot] merged 1 commit into pytorch:main from JulianCloudNTH:export-D108625761 on Jun 15, 2026
JulianCloudNTH (Contributor) commented Jun 15, 2026

Summary:

The WebGPU Dawn native nightly (`webgpu_native_test`) fails deterministically on the `llama1b_prefill` SDPA config with `FAIL: ambiguous attention output: 3 tensors match numel 262144`, which fails the binary and turns the job red.

`sdpa_with_kv_cache` returns three tensors `[k_cache, v_cache, attn_output]`. `test_sdpa_config` identified the attention output purely by element count (`numel == S*Hq*D`). For `llama1b_prefill` (`Hq=32, Hkv=8, D=64, S=128, Cmax=512`) the attention count `S*Hq*D = 128*32*64 = 262144` coincides exactly with each cache count `Cmax*Hkv*D = 512*8*64 = 262144`, so all three outputs match `numel` and the existing ambiguity guard correctly bails before any numeric comparison. The kernel output itself is fine -- the sibling `llama1b_decode` config (same `Hq/Hkv/D`) passes at `~1e-9`; only the test's output-selection heuristic was wrong. The colliding config and the numel selector were introduced together in D107595144.
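The collision can be checked at compile time. The following is a minimal sketch, not code from the test: the helper `numel4` and the `k`-prefixed constant names are hypothetical, and only the parameter values come from the summary above.

```cpp
#include <cstdint>

// Hypothetical helper: element count of a 4-D shape [d0, d1, d2, d3].
constexpr std::int64_t numel4(std::int64_t d0, std::int64_t d1,
                              std::int64_t d2, std::int64_t d3) {
  return d0 * d1 * d2 * d3;
}

// llama1b_prefill parameters as quoted in the summary.
constexpr std::int64_t kS = 128, kHq = 32, kHkv = 8, kD = 64, kCmax = 512;

// attn_output is [1, S, Hq, D]; each cache is [1, Cmax, Hkv, D].
constexpr std::int64_t kAttnNumel = numel4(1, kS, kHq, kD);
constexpr std::int64_t kCacheNumel = numel4(1, kCmax, kHkv, kD);

// The flat counts collide ...
static_assert(kAttnNumel == 262144, "attention output numel");
static_assert(kCacheNumel == 262144, "cache numel");
// ... even though the shapes differ in dims 1 and 2.
static_assert(kS != kCmax && kHq != kHkv, "shapes differ in dims 1-2");
```

Because `S*Hq` and `Cmax*Hkv` are both 4096 here, any selector that looks only at the flat count cannot tell the three outputs apart.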

Fix: disambiguate by shape instead of flat count. The attention output is `[1, S, Hq, D]` while each cache is `[1, Cmax, Hkv, D]`; these differ in dims 1-2 even when the flat count collides. Match `dim()==4 && size(1)==S && size(2)==Hq && size(3)==D`, keeping the `attn_matches > 1` ambiguity guard as a backstop.
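A minimal sketch of shape-based selection follows. It is not the code in `test_sdpa_config`: the `Shape` alias stands in for a real tensor, and `select_attn_output` and `check_llama1b_prefill` are hypothetical names. For brevity it returns -1 both when nothing matches and when more than one output matches, whereas the description only specifies the `attn_matches > 1` guard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor: only its shape matters here.
using Shape = std::vector<std::int64_t>;

// Returns the index of the single output shaped [*, S, Hq, D], or -1 when
// zero or more than one output matches (the ambiguity guard).
int select_attn_output(const std::vector<Shape>& outputs,
                       std::int64_t S, std::int64_t Hq, std::int64_t D) {
  int found = -1;
  int attn_matches = 0;
  for (std::size_t i = 0; i < outputs.size(); ++i) {
    const Shape& s = outputs[i];
    // Mirrors dim()==4 && size(1)==S && size(2)==Hq && size(3)==D.
    if (s.size() == 4 && s[1] == S && s[2] == Hq && s[3] == D) {
      found = static_cast<int>(i);
      ++attn_matches;
    }
  }
  return attn_matches == 1 ? found : -1;
}

// Checks the llama1b_prefill shapes quoted in the summary.
void check_llama1b_prefill() {
  const std::int64_t S = 128, Hq = 32, Hkv = 8, D = 64, Cmax = 512;
  const std::vector<Shape> outputs = {
      {1, Cmax, Hkv, D},  // k_cache
      {1, Cmax, Hkv, D},  // v_cache
      {1, S, Hq, D},      // attn_output
  };
  // All three shapes have numel 262144, but only index 2 matches by shape.
  assert(select_attn_output(outputs, S, Hq, D) == 2);
}
```

Calling `check_llama1b_prefill()` from a test entry point exercises the colliding config. The guard still matters: if a cache ever had `Cmax == S` and `Hkv == Hq`, the shapes themselves would coincide and the selector would return -1.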

Scope: test-only, one function (`test_sdpa_config`); no kernel, runtime, or export change.

Authored with Claude Code.

Reviewed By: Gasoonjia

Differential Revision: D108625761

pytorch-bot Bot commented Jun 15, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20283

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 15, 2026
meta-codesync Bot (Contributor) commented Jun 15, 2026

@JulianCloudNTH has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108625761.

@github-actions

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@meta-codesync meta-codesync Bot changed the title from "Fix Dawn nightly: select SDPA attn output by shape, not numel" to "Fix Dawn nightly: select SDPA attn output by shape, not numel (#20283)" on Jun 15, 2026
JulianCloudNTH added a commit to JulianCloudNTH/executorch that referenced this pull request Jun 15, 2026
@meta-codesync meta-codesync Bot merged commit a9dd615 into pytorch:main Jun 15, 2026
175 of 180 checks passed
@JulianCloudNTH JulianCloudNTH deleted the export-D108625761 branch June 15, 2026 17:22
Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. meta-exported
